1 Learning objectives

  1. Zoom in ggplot maps with Coordinate Reference Systems (CRS) using the coord_sf() function.

  2. Change the CRS projection of a ggplot map using the crs argument of coord_sf().

  3. Configure data with UTM coordinate system using the st_as_sf() function.

  4. Change the CRS projection of a sf object using the st_transform() function.

2 Prerequisites

This lesson requires the following packages:

if(!require('pacman')) install.packages('pacman')

pacman::p_load(malariaAtlas,
               colorspace,
               ggplot2,
               cholera,
               spData,
               here,
               rio,
               sf)

pacman::p_load_gh("afrimapr/afrilearndata",
                  "wmgeolab/rgeoboundaries")

3 Introduction

From the second lesson, we learned that Vector data require a Coordinate Reference System (CRS) to relate the spatial elements of the data with the surface of Earth. For that reason, Coordinate systems are a key component of geographic objects.

Figure 1. Direction of longitude and latitude.

In this lesson we are going to learn how to manage the CRS of maps by zooming in to an area of interest, set them up to external data with coordinates different to longitude and latitude (called UTM), and transform between different coordinate systems of CRS’s.

4 Data and basic plot

First, let us start with creating a base map of the world from the spData package using ggplot2:

ggplot(data = world) +
    geom_sf()

5 Manage Coordinate systems with coord_sf()

The function coord_sf() from the {ggplot2} package allows to deal with the coordinate system, which includes both the extent and projection of a map.

5.1 “Zoom in” on maps

The extent of the map can also be set in coord_sf(), in practice allowing to “zoom” in the area of interest, provided by limits on the x-axis (xlim), and on the y-axis (ylim).

Here, we zoom in the world map to the African continent, which is in an area delimited in longitude between 20°W and 55°E, and in latitude between 35°S and 40°N. To exactly match the limits provided, use expand = FALSE.

ggplot(data = world) +
    geom_sf() +
    coord_sf(xlim = c(-20, 55), ylim = c(-35, 40), expand = FALSE)

Check which + and - signs are related with the cardinal direction:

  • In longitude: West is -, East is +,
  • In latitude: South is -, North is +.

Zoom in the africountries map to Sierra Leona, which is in an area delimited in longitude between 14°W and 10°W, and in latitude between 6°N and 10°N.

5.2 Change the Projection of a map

The world object have a CRS projection called WGS84 (detailed in the fifth line of the header)

world
## Geometry set for 177 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
## First 5 geometries:

Which corresponds to the EPSG code 4326:

st_crs(world)$input
## [1] "EPSG:4326"

Projection refers to the mathematical equation that was used to project the truly round earth onto a flat surface.

Figure 2. Projection of a CRS.

  • EPSG refers to the European Petroleum Survey Group (EPSG).

  • EPSG is a Spatial Reference System Identifier (SRID) with arbitrary codes available for concrete CRS projections.

  • One of these projections is WGS84, which refers to World Geodetic System 1984.

Using the crs argument of the coord_sf() function, it is possible to override this setting, and project on the fly to any projection.

For example, we can change the current WGS84 projection to the ETRS89 Lambert Azimuthal Equal-Area projection (alias LAEA), which is EPSG code 3035:

ggplot(data = world) +
    geom_sf() +
    coord_sf(crs = 3035)

However, this CRS projection is useful for European countries.

Get into the linked webpage to find the EPSG code that its required:

Change the CRS projection of the ggplot map with the world object to the Pseudo-Mercator coordinate system.

Choosing a Projection / CRS

Which projection should I use?

To decide if a projection is right for your data, answer these questions:

  • What is the area of minimal distortion?
  • What aspect of the data does it preserve?

Take the time to identify a projection that is suited for your project. You don’t have to stick to the ones that are popular.

Online tools like Projection Wizard can also help you discover projections that might be a better fit for your data.

Figure 3. Steps to find a custom projection with Project Wizard.

For instance, to find an appropriate projection for the African continent, you can:

  1. Define your area of interest,
  2. Select a distortion property,
  3. Confirm the map outcome that fits your needs,
  4. Copy the text inside the PROJ option.

Then, paste that valid PROJ string to the crs argument:

ggplot(data = world) +
    geom_sf() +
    coord_sf(crs = "+proj=laea +lon_0=19.6875 +lat_0=0 +datum=WGS84 +units=m +no_defs")

PROJ is an open-source library for storing, representing and transforming CRS information.

Get into the linked webpage to find the PROJ string that its required:

Change the CRS projection of the ggplot map with the world object to the Aitoff coordinate system.

CRS components

A CRS has a few key components:

  • Coordinate System - There are many many different coordinate systems, so make sure you know which system your coordinates are from. (e.g. longitude/latitude, which is the most common);

  • Units - Know what the units are for your coordinate system (e.g. decimal degrees, meters);

  • Datum - A particular modeled version of the Earth. These have been revised over the years, so ensure that your map layers are using the same datum. (e.g. WGS84);

  • Projections - As defined above, it refers to the mathematical equation that was used to project the truly round earth onto a flat surface.

CRS projections

The “orange peel” analogy is useful to understand projections. If you imagine that the earth is an orange, how you peel it and then flatten the peel is similar to how projections get made.

  • A datum is the choice of fruit to use. Is the earth an orange, a lemon, a lime, a grapefruit?

Figure 4. Image of citrus.

  • A projection is how you peel your orange and then flatten the peel.

Figure 5. Image of peeled orange with globe.

Figure 6. Maps of the United States in different projections (Source: opennews.org)

The above image shows maps of the United States in different projections. Notice the differences in shape associated with each projection. These differences are a direct result of the calculations used to flatten the data onto a 2-dimensional map.

  • Data from the same location but saved in different projections will not line up in any GIS software.

  • Thus, it’s important when working with spatial data to identify the coordinate reference system applied to the data and retain it throughout data processing and analysis.

6 Set up a CRS Projection to UTM coordinates

Let’s say that you receive a data frame with coordinates, but without a CRS projection. If you want to make a ggplot map with it, you know that you can use st_as_sf() from the {sf} package.

As we learned in a previous lesson, for point data we need to specify the coordinates and the CRS:

fatalities %>% 
  st_as_sf(coords = c("x","y"),
           crs = 4326) %>% 
  ggplot() +
  geom_sf(alpha = 0.3)

However, what if you receive coordinates different to longitude and latitude?

To exemplify this scenario, we are going to use malaria prevalence in The Gambia. We use data of malaria prevalence in children obtained at 65 villages in The Gambia.

6.1 Malaria prevalence

First, we download the gambia.rda file from internet using its URL path with the {rio} package.

gambia <- rio::import("https://github.com/cran/geoR/raw/master/data/gambia.rda")

as_tibble(gambia)
## # A tibble: 2,035 × 8
##          x       y   pos   age netuse treated green   phc
##      <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl> <dbl>
##  1 349631. 1458055     1  1783      0       0  40.8     1
##  2 349631. 1458055     0   404      1       0  40.8     1
##  3 349631. 1458055     0   452      1       0  40.8     1
##  4 349631. 1458055     1   566      1       0  40.8     1
##  5 349631. 1458055     0   598      1       0  40.8     1
##  6 349631. 1458055     1   590      1       0  40.8     1
##  7 349631. 1458055     1   589      1       0  40.8     1
##  8 349631. 1458055     0   609      1       0  40.8     1
##  9 349631. 1458055     0   791      1       0  40.8     1
## 10 349631. 1458055     1   829      0       0  40.8     1
## # … with 2,025 more rows

It is a data frame with 2035 observations and 8 variables, among them:

  • x: x coordinate of the village (UTM),
  • y: y coordinate of the village (UTM),
  • pos: presence (1) or absence (0) of malaria in a blood sample taken from the child,

UTM stants for Universal Transverse Mercator, another coordinate system.

Aggregate data

Data in gambia are given at an individual level. To get an estimate of the prevalence per village, we need to aggregate the malaria tests by village.

Unique coordinates

Using the dplyr::distinct() function, we can see that gambia has 2035 rows and 65 unique pair of coordinates. This indicates that 2035 malaria tests were conducted at 65 locations.

gambia %>% 
  distinct(x,y) %>% 
  nrow()
## [1] 65

Use group_by() and summarise()

Here we use dplyr to create a data frame called gambia_point_summary containing, for each village the:

  • coordinates (x, y),
  • total number of tests performed (total),
  • number of positive tests (positive), and
  • malaria prevalence (prev).
gambia_point_summary <- 
  gambia %>% 
  group_by(x, y) %>%
  summarize(
    total = n(),
    positive = sum(pos),
    prev = positive / total
  ) %>% 
  ungroup()
gambia_point_summary
## # A tibble: 65 × 5
##          x       y total positive  prev
##      <dbl>   <dbl> <int>    <dbl> <dbl>
##  1 349631. 1458055    33       17 0.515
##  2 358543. 1460112    63       19 0.302
##  3 360308. 1460026    17        7 0.412
##  4 363796. 1496919    24        8 0.333
##  5 366400. 1460248    26       10 0.385
##  6 366688. 1463002    18        7 0.389
##  7 368420. 1460569    38       24 0.632
##  8 370400. 1460301    56        7 0.125
##  9 370628. 1499589    57        6 0.105
## 10 374403. 1501392    26        8 0.308
## # … with 55 more rows

Set up the CRS projection

Now we can plot the malaria prevalence. However, in order to use ggplot2 and geom_sf(), we need to transform the data.frame to an sf object.

To use the st_as_sf() function from the {sf} package, we need to specify a CRS projection. But in this case, the units of the x and y variables are not in Geographic coordinates (longitude/latitude). Instead, these data coordinates are in UTM format (Easting/Northing), also called Projected coordinates.

CRS coordinate systems:

  • Geographic (or unprojected) reference systems use longitude and latitude for referencing a location on the Earth’s three-dimensional ellipsoid surface.

  • Projected coordinate reference systems use easting and northing Cartesian coordinates for referencing a location on a two-dimensional representation of the Earth.

Figure 7. Coordinate systems. Left: Geographic, Right: Projected.

All Projected CRSs are based on a Geographic CRS and rely on map projections to convert the three-dimensional surface of the Earth into Easting and Northing (x and y) values in a projected CRS.

Figure 8. Coordinate systems. Left: Geographic, Right: Projected.

Which of the following options of Coordinate Reference System (CRS) types:

  1. "geographic_crs"
  2. "projected_crs"

…corresponds to each of these datasets, given the magnitude of the values in their x and y columns:

the parana dataset?

parana <- import("https://github.com/cran/geoR/raw/master/data/parana.rda")
as_tibble(parana$coords)

the fatalities dataset?

pacman::p_load(cholera)
as_tibble(fatalities)

Set UTM Projected coordinates with st_as_sf()

First, we need to set UTM coordinates. For this, we specify the projection of The Gambia, that is, UTM zone 28 ("+proj=utm +zone=28") in the st_as_sf() function of the {sf} package.

gambia_projected <- gambia_point_summary %>% 
  # first, specify the projection of gambia
  # UTM zone 28
  st_as_sf(coords = c("x", "y"),
           crs = "+proj=utm +zone=28")

gambia_projected
## Simple feature collection with 65 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 349631.3 ymin: 1458055 xmax: 622086.1 ymax: 1510610
## CRS:           +proj=utm +zone=28
## # A tibble: 65 × 4
##    total positive  prev           geometry
##  * <int>    <dbl> <dbl>        <POINT [m]>
##  1    33       17 0.515 (349631.3 1458055)
##  2    63       19 0.302 (358543.1 1460112)
##  3    17        7 0.412 (360308.1 1460026)
##  4    24        8 0.333 (363795.7 1496919)
##  5    26       10 0.385 (366400.5 1460248)
##  6    18        7 0.389 (366687.5 1463002)
##  7    38       24 0.632 (368420.5 1460569)
##  8    56        7 0.125 (370399.5 1460301)
##  9    57        6 0.105 (370627.7 1499589)
## 10    26        8 0.308 (374402.6 1501392)
## # … with 55 more rows

Confirm the presence of the:

  • CRS text (CRS: +proj=utm +zone=28) inside the header of the new sf object, and
  • the unit the geometry column in meters (<POINT [m]>).
  • The UTM system divides the Earth into 60 zones of 6 degrees of longitude in width. Each of the zones uses a transverse Mercator projection that maps a region of large north-south extent.

  • To get the UTM zones of various parts of the world, you could use online interactive maps, or gridded images available in wikipedia.

In the UTM system, a position on the Earth is given by the:

  • UTM zone number,
  • Hemisphere (north or south), and
  • Easting and northing coordinates in the zone which are measured in meters.
    • Eastings are referenced from the central meridian of each zone, and
    • northings are referenced from the equator.

parana_data contains the average rainfall over different years for the period May-June (dry-season). It was collected at 143 recording stations throughout Parana State, Brasil.

Set UTM coordinate system to the parana_data.

parana_data <- as_tibble(parana$coords) %>% 
  mutate(Rainfall = parana$data)

Transform to Geographic coordinates with st_transform()

We can transform the UTM projected coordinates to geographic coordinates (longitude/latitude with datum WGS84) using st_transform() where we set CRS to "+proj=longlat +datum=WGS84".

gambia_geographic <- gambia_projected %>% 
  # second, transform 
  # projected coordinates to
  # geographic coordinates
  st_transform(crs = "+proj=longlat +datum=WGS84")

gambia_geographic
## Simple feature collection with 65 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -16.38755 ymin: 13.18541 xmax: -13.87273 ymax: 13.66438
## CRS:           +proj=longlat +datum=WGS84
## # A tibble: 65 × 4
##    total positive  prev             geometry
##  * <int>    <dbl> <dbl>          <POINT [°]>
##  1    33       17 0.515 (-16.38755 13.18541)
##  2    63       19 0.302 (-16.30543 13.20444)
##  3    17        7 0.412 (-16.28914 13.20374)
##  4    24        8 0.333 (-16.25869 13.53742)
##  5    26       10 0.385 (-16.23294 13.20603)
##  6    18        7 0.389 (-16.23041 13.23094)
##  7    38       24 0.632 (-16.21431 13.20902)
##  8    56        7 0.125 (-16.19604 13.20668)
##  9    57        6 0.105 (-16.19569 13.56187)
## 10    26        8 0.308 (-16.16088 13.57834)
## # … with 55 more rows

Confirm the update of the:

  • CRS text to CRS: +proj=longlat +datum=WGS84 inside the header, and
  • the units of the geometry column to degrees (<POINT [°]>).

A PROJ string includes the following information:

  • +proj=: the projection of the data (e.g. utm, longlat, or laea)
  • +zone=: the zone of the data, specific to the UTM projection (e.g. 28)
  • +datum=: the datum use (e.g. WGS84)
  • +units=: the units for the coordinates of the data (e.g. m)

With the UTM coordinate system data stored in q6:

Transform its Projected CRS to a Geographic CRS using the longitude/latitude projection with datum WGS84.

Map prevalences

Now that you have set up the right CRS projection to your data, you can overlap these points with other Vector data objects:

gambia_adm_2 <- geoboundaries(country = "Gambia", adm_lvl = 2)
ggplot() +
  geom_sf(data = gambia_adm_2) +
  geom_sf(data = gambia_geographic, mapping = aes(color = prev)) +
  colorspace::scale_color_continuous_sequential(palette="Reds 3")

Which CRS to use?

“There exist no all-purpose projections, all involve distortion when far from the center of the specified frame” (Bivand, Pebesma, and Gómez-Rubio 2013).

  • When Geographic CRS, the answer is often WGS84.
    • It is used by default for web mapping, in GPS datasets, and vector datasets.
    • WGS84 is the most common CRS in the world, so it is worth knowing its EPSG code: 4326.
    • This ‘magic number’ can be used to convert objects with unusual projected CRSs into something that is widely understood.
  • About Projected CRS,
    • It is often a choice made by a public mapping agency.
    • With local data sources, work with the CRS in which the data was provided, to ensure compatibility, even if the official CRS is not the most accurate.

7 Wrap up

In this lesson, we have learned how to manage a CRS projection in ggplot maps and sf objects, how projections are codified with EPSG codes and PROJ strings, and how to transform the CRS between different coordinate systems (projected and geographic).

In the next lesson, we are going to use all our previous learning to built one single thematic map by layers, and enrich them with text and labels referring to specific places or regions, and important map elements like scale bars and a north arrow!

Contributors

The following team members contributed to this lesson:

References

Some material in this lesson was adapted from the following sources:

This work is licensed under the Creative Commons Attribution Share Alike license. Creative Commons License